Due date: Monday 10/18, end of day
This assignment will contain two parts:
In this assignment, we'll explore spatial trends evictions in Philadelphia using data from the Eviction Lab and building code violations using data from OpenDataPhilly.
We'll be exploring the idea that evictions can occur as retaliation against renters for reporting code violations. Spatial correlations between evictions and code violations from the City's Licenses and Inspections department can offer some insight into this question.
A couple of interesting background readings:
The Eviction Lab built the first national database for evictions. If you aren't familiar with the project, you can explore their website: https://evictionlab.org/
geopandas¶The first step is to read the eviction data by census tract using geopandas. The data for all of Pennsylvania by census tract is available in the data/ folder in a GeoJSON format.
Load the data file "PA-tracts.geojson" using geopandas
Note: If you'd like to see all columns in the data frame, you can increase the max number of columns using pandas display options:
import matplotlib
import pandas as pd
import geopandas as gpd
import hvplot.pandas
import geoviews as gv
import geoviews.tile_sources as gvts
import holoviews as hv
import numpy as np
import rasterio as rio
import contextily as cx
hv.extension('bokeh', 'matplotlib')
from matplotlib import pyplot as plt
from holoviews import opts
from rasterio.mask import mask
from rasterstats import zonal_stats
from mpl_toolkits.axes_grid1 import make_axes_locatable
%matplotlib inline
pd.options.display.max_columns = 999
pd.options.display.max_rows = 999
evPA = gpd.read_file("./data/PA-tracts.geojson")
We will need to trim data to Philadelphia only. Take a look at the data dictionary for the descriptions of the various columns in top-level repository folder: eviction_lab_data_dictionary.txt
Note: the column names are shortened — see the end of the above file for the abbreviations. The numbers at the end of the columns indicate the years. For example, e-16 is the number of evictions in 2016.
Take a look at the individual columns and trim to census tracts in Philadelphia. (Hint: Philadelphia is both a city and a county).
evphilly = evPA.loc[(evPA['pl']=="Philadelphia County, Pennsylvania")]
For this assignment, we are interested in the number of evictions by census tract for various years. Right now, each year has it's own column, so it will be easiest to transform to a tidy format.
Use the pd.melt() function to transform the eviction data into tidy format, using the number of evictions from 2003 to 2016.
The tidy data frame should have four columns: GEOID, geometry, a column holding the number of evictions, and a column telling you what the name of the original column was for that value.
Hints:
GEOID and geometry columns as the id_vars. This will keep track of the census tract information. value_vars.value_vars = ['e-{:02d}'.format(x) for x in range(3, 17)]
Trim to specific columns:
needcolumns = ['GEOID','geometry']
value_vars = ['e-{:02d}'.format(x) for x in range(3, 17)]
needcolumns.extend(value_vars)
Data Melting:
evphilly = evphilly[needcolumns]
value_vars = ['e-{:02d}'.format(x) for x in range(3, 17)]
mlt_evphilly = evphilly.melt(
id_vars=['GEOID','geometry'],
value_vars=value_vars,
var_name='year',
value_name='evictions')
Use hvplot to plot the total number of evictions from 2003 to 2016. You will first need to perform a group by operation and sum up the total number of evictions for all census tracts, and then use hvplot() to make your plot.
You can use any type of hvplot chart you'd like to show the trend in number of evictions over time.
group1_evphilly = mlt_evphilly
group1_evphilly = group1_evphilly.groupby('year',as_index=True)['evictions'].sum()
bar1=group1_evphilly.hvplot(
kind='bar',
color='#FE5C5A',
hover_color = '#FFBA48',
line_color = '#FE5C5A',
rot=90,
width=800,
height=300,
title='The total number of evictions per year from 2003 to 2016',
xlabel='Year',
ylabel ='Total Number',
ylim=tuple([0,16000]),
attr_labels=True)
dot1=group1_evphilly.hvplot(
kind='scatter',
color='#31FFCE',
size = 50,
hover_color = '#FFBA48',
rot=90,
width=800,
height=300,
attr_labels=True)
line1=group1_evphilly.hvplot(
kind='line',
color='#27CCA4',
width=800,
height=300)
final_chart = bar1 * line1 * dot1
final_chart.opts(bgcolor='#2C2C5C')
Our tidy data frame is still a GeoDataFrame with a geometry column, so we can visualize the number of evictions for all census tracts.
Use hvplot() to generate a choropleth showing the number of evictions for a specified year, with a widget dropdown to select a given year (or variable name, e.g., e-16, e-15, etc).
Hints
groupby keyword to tell hvplot to make a series of maps, with a widget to select between them.dynamic=False as a keyword argument to the hvplot() function. width and height that makes your output map (roughly) square to limit distortionsmlt_evphilly
group_evphilly = mlt_evphilly
geo_evphilly = mlt_evphilly.loc[(mlt_evphilly['year']=="e-03")]
geo_evphilly = geo_evphilly.drop(['year','evictions'],axis=1)
group_evphilly = group_evphilly.groupby(['year','GEOID'],as_index=False).sum('evictions')
group_evphilly = group_evphilly.merge(geo_evphilly, on='GEOID')
hv.output(widget_location='bottom')
choro = group_evphilly.hvplot(c='evictions',
frame_width=600,
frame_height=600,
groupby = 'year',
alpha=0.65,
geo=True,
cmap='viridis',
hover_cols=['GEOID'],
dynamic=False,
)
total = gvts.Wikipedia * choro
total.opts(
opts.WMTS(width=100, height=100, xaxis=None, yaxis=None),
opts.Overlay(title="The number of evictions across Philadelphia"))
Next, we'll explore data for code violations from the Licenses and Inspections Department of Philadelphia to look for potential correlations with the number of evictions.
L+I violation data for years including 2012 through 2016 (inclusive) is provided in a CSV format in the "data/" folder.
Load the data using pandas and convert to a GeoDataFrame.
LViolation = pd.read_csv("./data/li_violations.csv")
LViolation['geometry'] = gpd.points_from_xy(LViolation['lng'], LViolation['lat'])
LViolation = gpd.GeoDataFrame(LViolation, geometry='geometry', crs="EPSG:4326")
There are many different types of code violations (running the nunique() function on the violationdescription column will extract all of the unique ones). More information on different types of violations can be found on the City's website.
Below, I've selected 15 types of violations that deal with property maintenance and licensing issues. We'll focus on these violations. The goal is to see if these kinds of violations are correlated spatially with the number of evictions in a given area.
Use the list of violations given to trim your data set to only include these types.
violation_types = [
"INT-PLMBG MAINT FIXTURES-RES",
"INT S-CEILING REPAIR/MAINT SAN",
"PLUMBING SYSTEMS-GENERAL",
"CO DETECTOR NEEDED",
"INTERIOR SURFACES",
"EXT S-ROOF REPAIR",
"ELEC-RECEPTABLE DEFECTIVE-RES",
"INT S-FLOOR REPAIR",
"DRAINAGE-MAIN DRAIN REPAIR-RES",
"DRAINAGE-DOWNSPOUT REPR/REPLC",
"LIGHT FIXTURE DEFECTIVE-RES",
"LICENSE-RES SFD/2FD",
"ELECTRICAL -HAZARD",
"VACANT PROPERTIES-GENERAL",
"INT-PLMBG FIXTURES-RES",
]
slc_LViolation = LViolation.loc[LViolation['violationdescription'].isin(violation_types)]
The code violation data is point data. We can get a quick look at the geographic distribution using matplotlib and the hexbin() function. Make a hex bin map of the code violations and overlay the census tract outlines.
Hints:
slc_LViolation
hv.output(widget_location='bottom')
hex1 = slc_LViolation.hvplot.hexbin(
frame_width=600,
frame_height=600,
x='lng',
y='lat',
groupby = "violationdescription",
logz=True,
geo=True,
gridsize=40,
cmap='viridis',
dynamic=False)
boundary1 = geo_evphilly.hvplot.polygons(
geo=True,
alpha=0,
line_alpha=0.5,
line_width =1,
line_color="black",
hover=False,
width=600,
height=600,)
combination = gvts.Wikipedia*hex1 * boundary1
combination.opts(
opts.WMTS(width=100, height=100, xaxis=None, yaxis=None),
opts.Overlay(title="Geographic distribution of code violation"))
To do a census tract comparison to our eviction data, we need to find which census tract each of the code violations falls into. Use the geopandas.sjoin() function to do just that.
Hints
geometry column (specifying census tract polygons) and the GEOID column (specifying the name of each census tract).Next, we'll want to find the number of violations (for each kind) per census tract. You should group the data frame by violation type and census tract name.
The result of this step should be a data frame with three columns: violationdescription, GEOID, and N, where N is the number of violations of that kind in the specified census tract.
Optional: to make prettier plots
Some census tracts won't have any violations, and they won't be included when we do the above calculation. However, there is a trick to set the values for those census tracts to be zero. After you calculate the sizes of each violation/census tract group, you can run:
N = N.unstack(fill_value=0).stack().reset_index(name='N')
where N gives the total size of each of the groups, specified by violation type and census tract name.
See this StackOverflow post for more details.
This part is optional, but will make the resulting maps a bit prettier.
tracts_slc_LViolation = slc_LViolation
tracts_slc_LViolation = gpd.sjoin(tracts_slc_LViolation, geo_evphilly, how='right')
numtract = tracts_slc_LViolation
numtract["N"] = 1
numtract = numtract.groupby(['GEOID','violationdescription'])['N'].sum()
numtract = numtract.unstack(fill_value=0).stack().reset_index(name='N')
numtract.head()
We now have the number of violations of different types per census tract specified as a regular DataFrame. You can now merge it with the census tract geometries (from your eviction data GeoDataFrame) to create a GeoDataFrame.
Hints
pandas.merge() and specify the on keyword to be the column holding census tract names. pandas.merge() function.numtract = geo_evphilly.merge(numtract, on='GEOID')
Now, we can use hvplot() to create an interactive choropleth for each violation type and add a widget to specify different violation types.
Hints
groupby keyword to tell hvplot to make a series of maps, with a widget to select different violation types.dynamic=False as a keyword argument to the hvplot() function. width and height that makes your output map (roughly) square to limit distortionshv.output(widget_location='bottom')
pol1 = numtract.hvplot.polygons(c='N',
frame_width=600,
frame_height=600,
groupby = "violationdescription",
geo=True,
cmap='viridis',
alpha=0.7,
hover_cols='all',
dynamic=False)
combination2 = gvts.Wikipedia * pol1
combination2.opts(
opts.WMTS(width=600, height=600, xaxis=None, yaxis=None),
opts.Overlay(title="Geographic distribution of code violation"))
From the interactive maps of evictions and violations, you should notice a lot of spatial overlap.
As a final step, we'll make a side-by-side comparison to better show the spatial correlations. This will involve a few steps:
hvplot() to make two interactive choropleth maps, one for the data from step 1. and one for the data in step 2.Note: since we selected a single year and violation type, you won't need to use the groupby= keyword here.
evphilly_16 = group_evphilly.loc[(group_evphilly['year']=='e-16')]
specific_numtract = numtract.loc[(numtract['violationdescription']=='VACANT PROPERTIES-GENERAL')]
choro2 = evphilly_16.hvplot(c='evictions',
frame_width=400,
frame_height=600,
alpha=0.7,
geo=True,
cmap='viridis',
hover_cols='all',
dynamic=False,
title="Geo-distribution of eviction in 2016",
fontsize=9
)
pol2 = specific_numtract.hvplot(c='N',
frame_width=400,
frame_height=600,
geo=True,
cmap='Plasma',
alpha=0.7,
hover_cols='all',
dynamic=False,
title="Geo-distribution of the violation - VACANT PROPERTIES-GENERAL",
fontsize=9
)
combination3 = (gvts.Wikipedia * choro2 + gvts.Wikipedia * pol2).cols(2)
combination3
Identify the 20 most common types of violations within the time period of 2012 to 2016 and create a set of interactive choropleths similar to what was done in section 1.2.7.
Use this set of maps to identify 3 types of violations that don't seem to have much spatial overlap with the number of evictions in the City.
top20_vilation = LViolation
top20_vilation['N']=1
top20_vilation = top20_vilation.groupby('violationdescription',as_index=False).sum('N').sort_values(['N'],ascending = False)
top20_vilation = top20_vilation.iloc[0:20]
top20_vilation = top20_vilation.drop(['lat','lng'],axis=1)
top20_vilation = top20_vilation.reset_index(drop=True)
top20 = top20_vilation['violationdescription']
top_2_LViolation = LViolation.loc[LViolation['violationdescription'].isin(top20)]
top_2_LViolation = gpd.sjoin(top_2_LViolation, geo_evphilly, how='right')
top20_numtract = top_2_LViolation
top20_numtract = top20_numtract.groupby(['GEOID','violationdescription'])['N'].sum()
top20_numtract = top20_numtract.unstack(fill_value=0).stack().reset_index(name='N')
top20_numtract = geo_evphilly.merge(top20_numtract, on='GEOID')
total_evphilly = mlt_evphilly
total_evphilly = total_evphilly.groupby('GEOID',as_index=False).sum('evictions')
total_evphilly = total_evphilly.merge(geo_evphilly,on='GEOID')
points_evphilly = total_evphilly
points_evphilly['geometry'] = points_evphilly['geometry'].centroid
points_evphilly.crs =total_evphilly.crs
points_evphilly['eviction2']=points_evphilly['evictions']/3
hv.output(widget_location='bottom')
pol2 = top20_numtract.hvplot.polygons(c='N',
frame_width=600,
frame_height=600,
groupby = "violationdescription",
geo=True,
cmap='viridis',
alpha=1,
hover_cols='all',
dynamic=False)
point3 = points_evphilly.hvplot(
size='eviction2',
line_color = '#38CBE7',
frame_width=600,
frame_height=600,
geo=True,
c='#38CBE7',
alpha=0.4,
dynamic=False)
combination3 = gvts.Wikipedia * pol2 * point3
combination3.opts(
opts.WMTS(width=600, height=600, xaxis=None, yaxis=None),
opts.Overlay(title="Geographic distribution of Top20 code violation"))
are three types of violations that don't seem to have much spatial overlap with the number of evictions in the City.
In this part, we'll explore the NDVI in Philadelphia a bit more. This part will include two parts:
Use rasterio to load the landsat data for Philadelphia (available in the "data/" folder)
landsat = rio.open("./data/landsat8_philly.tif")
Create two polygon objects, one for the city limits and one for the suburbs. To calculate the suburbs polygon, we will take everything outside the city limits but still within the bounding box.
c_limit = gpd.read_file("./data/City_Limits.geojson")
c_limit = c_limit.to_crs(landsat.crs.data['init'])
env = c_limit.envelope
envgdf = gpd.GeoDataFrame(gpd.GeoSeries(env))
envgdf = envgdf.rename(columns={0:'geometry'}).set_geometry('geometry')
envgdf = envgdf.to_crs(landsat.crs.data['init'])
sub_limit = gpd.overlay(envgdf,c_limit, how='difference')
Using the two polygons from the last section, use rasterio's mask functionality to create two masked arrays from the landsat data, one for the city and one for the suburbs.
For each masked array, calculate the NDVI.
city, mask_transform = mask(
dataset=landsat,
shapes=c_limit.geometry,
crop=True,
all_touched=True,
filled=False,
)
suburb, mask_transform = mask(
dataset=landsat,
shapes=sub_limit.geometry,
crop=True,
all_touched=True,
filled=False,
)
c_red = city[3]
c_nir = city[4]
sub_red = suburb[3]
sub_nir = suburb[4]
def calculate_NDVI(nir, red):
"""
Calculate the NDVI from the NIR and red landsat bands
"""
nir = nir.astype(float)
red = red.astype(float)
check = np.logical_and( red.mask == False, nir.mask == False )
ndvi = np.where(check, (nir - red ) / ( nir + red ), np.nan )
return ndvi
city_NDVI = calculate_NDVI(c_nir, c_red)
sub_NDVI = calculate_NDVI(sub_nir, sub_red)
fig, ax = plt.subplots(figsize=(10,10))
landsat_extent = [
landsat.bounds.left,
landsat.bounds.right,
landsat.bounds.bottom,
landsat.bounds.top,
]
img = ax.imshow(city_NDVI,extent=landsat_extent)
c_limit.boundary.plot(ax=ax, edgecolor='gray', facecolor='none', linewidth=4)
plt.colorbar(img)
ax.set_axis_off()
ax.set_title("NDVI in City of Philadelphia", fontsize=18);
fig, ax = plt.subplots(figsize=(10,10)) landsat_extent = [ landsat.bounds.left, landsat.bounds.right, landsat.bounds.bottom, landsat.bounds.top, ] img = ax.imshow(sub_NDVI,extent=landsat_extent) sub_limit.boundary.plot(ax=ax, edgecolor='gray', facecolor='none', linewidth=4)
plt.colorbar(img) ax.set_axis_off() ax.set_title("NDVI in Suburb of Philadelphia", fontsize=18)
nanmedian function will be useful for ignoring NaN elementsmed_sub = np.nanmedian(sub_NDVI)
med_city = np.nanmedian(city_NDVI)
print("median NDVI within the city:",med_city," median NDVI within the suburbs:",med_sub)
c_stats = zonal_stats( c_limit, city_NDVI, affine=landsat.transform, stats=['median'])
sub_stats = zonal_stats( sub_limit, sub_NDVI, affine=landsat.transform, stats=['median']) print(c_stats,sub_stats)
The data is available in the "data/" folder. It has been downloaded from OpenDataPhilly. It contains the locations of abot 2,500 street trees in Philadelphia.
st_trees = gpd.read_file("./data/ppr_tree_canopy_points_2015.geojson")
st_trees = st_trees.to_crs(landsat.crs.data['init'])
tree_NDVI = zonal_stats(
st_trees,
city_NDVI,
affine=landsat.transform,
stats=['mean'])
st_trees['NDVI'] = [s['mean'] for s in tree_NDVI]
st_trees.head()
Make two plots of the results:
hist function. Include a vertical line that marks the NDVI = 0 thresholdplot function. Include the city limits boundary on your plot.The figures should be clear and well-styled, with for example, labels for axes, legends, and clear color choices.
fig, ax = plt.subplots(figsize=(8,6))
plt.grid(color='#A8A8A8', lw=0.5,linestyle='dashed')
ax.hist(st_trees['NDVI'], bins=50,color='#47614A')
ax.axvline(x=0, c='k', lw=2,linestyle='dashed')
ax.set_xlabel("Tree NDVI", fontsize=18)
ax.set_ylabel("Number of Trees", fontsize=18);
ax.set_title("Tree NDVI in Philadelphia", fontsize=18);
geo_philly = geo_evphilly
geo_philly = geo_philly.to_crs(c_limit.crs)
fig, ax = plt.subplots(figsize=(10,10))
divider = make_axes_locatable(ax)
cax = divider.append_axes("right", size="5%", pad=0.1)
c_limit.plot(ax=ax, edgecolor='black', facecolor='none', linewidth=4)
geo_philly.plot(ax=ax, edgecolor='black', facecolor='none', linewidth=0.1)
st_trees.plot(
column='NDVI',
legend=True,
ax=ax,
cax=cax,
cmap='OrRd',
s = 55,
alpha = 0.5)
cx.add_basemap(ax,
zoom=12,
crs=st_trees.crs)
plt.title("Brooklyn",fontsize=10)
ax.set_title("Tree NDVI in Philadelphia EPSG:32618", fontsize=18);